Efficient Remote Homology Detection with Secondary Structure
Abstract
Motivation: The function of an unknown biological sequence can often be accurately inferred if we can map the sequence to its homologous family. Currently, the discriminative approach, which combines support vector machines (SVMs) with sequence similarity, is recognized as the most accurate. The SVM-Fisher and SVM-pairwise methods are two representatives of this approach, and SVM-pairwise is the most accurate. However, these methods encode only sequence information into their feature vectors and ignore structure information. In addition, one of their major drawbacks is their computational inefficiency. Based on these observations, we present an alternative method for SVM-based protein classification. Our method, SVM-I-sites, uses structure similarity instead of sequence similarity for remote homology detection. Our studies show that SVM-I-sites is much more efficient than both SVM-Fisher and SVM-pairwise while achieving performance comparable to SVM-pairwise.
Results: We use SCOP 1.53 as our dataset. The results show that SVM-I-sites runs much faster, outperforms many state-of-the-art sequence-based methods such as PSI-BLAST, SAM and SVM-Fisher, and performs comparably to SVM-pairwise.
Availability: The I-sites server is accessible through the web at http://www.bioinfo.rpi.edu. Programs are available upon request for academics. Licensing agreements are available for commercial interests. The framework for encoding local structure into feature vectors is available upon request.
Contact: [email protected], [email protected]
Similar resources
Length Encoded Secondary Structure Profile for Remote Homologous Protein Detection
Protein data is growing explosively in both volume and diversity, yet many protein structures remain unresolved and their functions remain to be identified. Conventional sequence alignment tools are insufficient for remote homology detection, while current structural alignment tools encounter difficulties with proteins of unresolved structure. Here, we aimed to ove...
Protein Remote Homology Detection Based on Binary Profiles
Remote homology detection is a key element of protein structure and function analysis in computational and experimental biology. This paper presents a simple representation of protein sequences, which uses the evolutionary information of profiles for efficient remote homology detection. The frequency profiles are directly calculated from the multiple sequence alignments outputted by PSI-BLAST a...
Incorporating homologues into Sequence Embeddings for protein Analysis
Statistical and learning techniques are becoming increasingly popular for different tasks in bioinformatics. Many of the most powerful statistical and learning techniques are applicable to points in a Euclidean space but not directly applicable to discrete sequences such as protein sequences. One way to apply these techniques to protein sequences is to embed the sequences into a Euclidean space...
In Silico and in Vitro investigations on cry4a and cry11a toxins of Bacillus thuringiensis var Israelensis
In the present study we attempted to correlate the structure and function of the cry11a (72 kDa) and cry4a (135 kDa) proteins of Bacillus thuringiensis var israelensis. Homology modeling and secondary structure predictions were done to locate most probable regions for finding helices or strands in these proteins. The JPRED (JPRED consensus secondary structure prediction server) secondary struct...
Introducing An Efficient Set of High Spatial Resolution Images of Urban Areas to Evaluate Building Detection Algorithms
The present work aims to introduce an efficient set of high spatial resolution (HSR) images in order to more fairly evaluate building detection algorithms. The introduced images are chosen from two recent HSR sensors (QuickBird and GeoEye-1) and based on several challenges of urban areas encountered in building detection such as diversity in building density, building dissociation, building sha...
Publication date: 2003